Evaluating Instructional Effectiveness



ABSTRACT 



An instructor's teaching effectiveness for a particular class is assessed by comparing his/her actual evaluation score with
the one that is predicted for the class rather than with the average score for all other instructors teaching the same class.
The prediction is based on a random-effects regression model of the evaluation score on student, class, and teacher
characteristics that are not under the control of the instructor. Teaching effectiveness is measured in this manner for a
student evaluation score and a knowledge-based test score for 49 classes/instructors of principles of economics taught
at comprehensive universities.
1. INTRODUCTION

Teaching excellence ranks high on universities' lists of objectives. The most common approach to assessing teaching
excellence is the evaluation of teaching by students toward the end of the term. Such evaluations are in wide use across the U.S.
(Seldin 1989). Student evaluations of teacher performance are likely to receive even more attention in the future as a new
wave of outcomes assessment sweeps across U.S. colleges and universities (McCoy et al. 1994).

Typically, the questionnaire filled out by students to evaluate instructors contains a question on the instructor's
overall teaching performance relative to that of other teachers. University administrators tend to make heavy use of
students' answers to this question because it seemingly cuts through the complexity of teaching evaluations: just one
number says it all. It enters into formal assessments ranging from faculty merit pay to decisions on tenure and promotion
(White 1995).

An instructor's teaching can also be evaluated by identifying how much students have learnt during the semester.
Compared to student evaluations, this is a considerably more involved task and, therefore, less popular. To examine a
student's knowledge gain during the term, one would need to assess what students know at the end of the term compared
to what they knew upon entering the class. Some efforts have been made in this respect in economics with the Test of
Understanding in College Economics (TUCE), developed by the Joint Council on Economic Education.

One may also think of assessing an instructor’s teaching effectiveness by using a measure that combines both 
student evaluations and a test of knowledge gained, such as the TUCE. This could be a useful route if it is indeed true 
that student evaluations are not significantly correlated with student performance in the course, as suggested, for example, 
by Abrami et al. (1990) or Gramlich and Greenlee (1993). 

Deciding between student evaluations and an outcomes measure, such as the TUCE, is one aspect of the larger
issue of measuring teaching effectiveness. Generally, discussions stop at this point. The present study also considers
another aspect of evaluating teaching effectiveness: the standard that is applied to evaluate an instructor's performance.
Typically, an instructor's performance score is simply compared to the average score of all instructors. Although this
method has the benefit of simplicity, it suffers from an obvious defect. Instructors teaching ill-prepared students, large
classes, mandatory courses, difficult material, or at inconvenient times are likely to end up faring less well than instructors
facing more favorable conditions in their courses. This type of problem can easily undermine the trust instructors have
in the whole process of measuring teaching effectiveness. As such, it is clearly incompatible with an incentive structure
that intends to promote increased productivity.* There is an alternative: to compare an instructor's performance
to the score that can reasonably be expected of him/her given the particular conditions he/she faces in
his/her class. The key objective of this paper is to illustrate how that can be done.

The paper is organized as follows. The next section discusses the methodology. Subsequently, the methodology
is illustrated with an application to a number of classes that were part of the TUCE III project. In particular, it is shown
how instructor rankings can change rather significantly as one switches between alternative performance scores and/or
between different methods of evaluating a particular performance score. The paper ends with a summary of its main
points and a discussion of its implications for teaching assessment at universities.

2. METHODOLOGY AND DATA 

The objective of this study is to compare instructor rankings for three alternative teaching performance measures: an
outcomes measure (TUCE), student evaluations of teaching (SET), and a weighted average of the two. For each
performance measure, two alternative approaches to ranking instructors are presented. The first approach simply
compares an instructor's score to those of other instructors. The second approach compares an instructor's score to the
one that can be expected for the instructor's class.

The first approach to comparing teaching scores is familiar and needs no explanation. The second approach
represents this paper's key innovation. The idea is that instructors should be held responsible for a teaching evaluation
or learning outcomes score only to the extent that he/she can influence it. The impact on the evaluation score of factors
that are not under the control of the instructor should be ignored for evaluation purposes.

There are at least two broad groups of factors that could affect an instructor's evaluation score without themselves
being under the control of the instructor: student characteristics, such as grade point average or SAT scores,
and course characteristics, such as class size. The literature is rich with evidence that the type of course as well as student
characteristics influence student evaluations (e.g. Aigner and Thum 1986, Langbein 1994, Koon and Murray 1995). For
example, required core curriculum courses tend to get lower evaluations than electives; the expectation of a good grade
improves evaluations; courses that demand more of a student's time outside of class have lower ratings; smaller classes
increase student evaluations (Glass, McGaw, and Smith 1981). There is also ample evidence that student and course
characteristics affect an outcomes measure such as the TUCE (e.g. Lopus and Maxwell 1995).

* Becker (1979) has shown theoretically that rewarding teachers for better teaching has little impact on the teaching
outcome unless efforts are made simultaneously to improve the accuracy of measuring teaching ability.

To circumvent the problem of attributing to instructors the consequences of conditions they are not responsible
for, we compare an instructor's actual performance measure with the one that is predicted for his/her class. The
prediction is based on a regression of actual performance on factors that are not under an instructor's control. There are
numerous such factors. It is helpful to organize them into groups or categories reminiscent of the production
function approach to economic education (Siegfried and Fels 1979): (i) student characteristics (SC), such as grade point
average, previous knowledge of the subject matter, SAT or ACT scores, and part-time or full-time
status; (ii) class characteristics (CC), such as class size, meetings per week, and time of day of class meetings; and (iii)
teacher characteristics that are not under the instructor's control (TC), such as years teaching, years teaching the course
under investigation, terminal degree, and whether English is the instructor's native language.

Three steps are involved in removing these factors from an instructor's performance score. First, a regression
equation is specified that explains instructors' scores as a function of the above three groups of factors:

Instructors' scores = f(SC, CC, TC).

This equation is estimated for a data set that includes comparable classes, such as principles of economics at one
institution or at several similar institutions. Second, the estimated regression is used for prediction purposes. For the
particular class to be evaluated, class averages are calculated for all regressors and combined with the estimated
coefficients to yield a predicted value of the instructor's score. Third, this predicted score is compared to the instructor's
actual score. A teacher whose actual score exceeds the predicted one is considered to have performed above average, or
above expectations, and vice versa. Instructor rankings can be based on the percentage difference between actual and
predicted scores.
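The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation code behind the paper: it uses pooled least squares (NumPy's `lstsq`) as a stand-in for the random-effects estimator discussed below, and the function names and toy data are hypothetical.

```python
import numpy as np


def fit_prediction_equation(X, y):
    """Step 1: least-squares fit of student-level scores y on regressors X.

    X should hold only factors outside the instructor's control (student,
    class, and teacher characteristics). An intercept column is prepended.
    Pooled OLS is a simplifying assumption; the paper estimates a
    random-effects model instead.
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predicted_class_score(coef, X_class):
    """Step 2: evaluate the fitted equation at the class averages of all regressors."""
    means = np.asarray(X_class, dtype=float).mean(axis=0)
    return coef[0] + means @ coef[1:]


def percent_difference(actual, predicted):
    """Step 3: actual class score relative to its prediction, in percent."""
    return 100.0 * (actual - predicted) / predicted


# Hypothetical toy data: one regressor (e.g. a pre-test score) for six students.
X = np.array([[1.0], [2.0], [3.0], [1.0], [2.0], [3.0]])
y = np.array([5.0, 8.0, 11.0, 5.0, 8.0, 11.0])  # exactly 2 + 3 * x
coef = fit_prediction_equation(X, y)

# A class whose students average x = 2 is predicted to score 8, so an
# actual class average of 10 lies 25 percent above expectations.
pred = predicted_class_score(coef, np.array([[1.0], [3.0]]))
print(round(pred, 6), round(percent_difference(10.0, pred), 6))
```

In an application, `X` would contain the SC, CC, and TC variables of Table 1, and the coefficients would come from the preferred random-effects model rather than from pooled least squares.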

Since the variables that characterize an instructor's teaching style, effort, and talent are omitted from the
regressions, the coefficients will be biased to the extent that the included variables are correlated with the omitted
variables. Bias in the coefficients may affect the rankings of instructors. A simple way around this potential
problem is to allow the intercept term to vary between classes/instructors. The intercept term would then
incorporate all unobservable or unmeasured instructor effects. This points to either a fixed-effects (FE) or a random-effects
(RE) model. A key advantage of the FE estimator is that it remains consistent even if the instructor
effects are correlated with the included regressors. Unfortunately, the type of data being analyzed rules out a fully specified
FE-model: all class variables (CC) and all teacher variables (TC) take the same value for all observations of a particular
class, so their effects cannot be separated from the class-specific intercepts. Hence, the choice between an FE- and an
RE-model has to be made on the basis of the variable set SC alone.
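Why the FE-model cannot accommodate CC and TC can be seen from the within transformation that underlies the fixed-effects estimator: every variable is expressed as a deviation from its class mean. A regressor that is constant within each class, such as class size, becomes a column of zeros, so no coefficient can be estimated for it. The sketch below uses hypothetical data for five students in two classes; the function name is illustrative.

```python
import numpy as np


def within_transform(values, groups):
    """Express each observation as a deviation from its class (group) mean.

    This is the demeaning step of the fixed-effects (within) estimator.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    out = values.copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= values[mask].mean(axis=0)
    return out


# Hypothetical data: five students in two classes.
groups = np.array([1, 1, 1, 2, 2])
gpa = np.array([2.8, 3.4, 3.1, 2.5, 3.9])      # SC variable: varies within a class
class_size = np.array([35, 35, 35, 120, 120])  # CC variable: constant within a class

print(within_transform(gpa, groups))         # deviations from class means remain
print(within_transform(class_size, groups))  # all zeros: nothing left to identify a coefficient
```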

Estimation is done for two performance scores: (i) an outcomes measure of learning (TUCE) and (ii) a measure
of student evaluations (SET). The data come from the TUCE III data base (Saunders 1991, Saunders et al. 1991).
The data set was assembled during the norming of the test in the fall and spring terms of the 1989-90 academic year. Of
the 9,768 observations available (one observation per student), the study uses only those that relate to
public comprehensive universities, in order to reduce the amount of neglected heterogeneity. This reduced data
set includes 49 different classes and 848 observations. The definitions of the variables are given in Table 1.

3. RESULTS 



In a first step, SET and TUCE are regressed on student characteristics (SC) only, using OLS, FE, and RE estimators.
The top part of Table 2 summarizes the essential test statistics. Incorporating cross-section effects of either the fixed or
random type raises the R² appreciably. Both OLS regressions are rejected relative to the FE- and the RE-model
alternatives at any common level of statistical significance. A Hausman test of the RE- versus the FE-model is unable to
reject the random-effects model in either case. Taken together, these results suggest that the RE-model is the preferred choice
for the reduced variable set, incorporating only SC.
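For reference, the Hausman statistic compares the FE and RE slope estimates of the common regressors (here the SC variables). Under the null hypothesis that the class effects are uncorrelated with the regressors, both estimators are consistent and RE is efficient, and the statistic is asymptotically chi-square with as many degrees of freedom as coefficients compared. The sketch below is a generic textbook version, not the software used for Table 2; the coefficient vectors and covariance matrices in the example are hypothetical.

```python
import numpy as np


def hausman(b_fe, b_re, v_fe, v_re):
    """Hausman statistic H = d' (V_fe - V_re)^(-1) d with d = b_fe - b_re.

    Returns H and its degrees of freedom (the number of coefficients
    compared). In finite samples V_fe - V_re need not be positive
    definite; this sketch simply assumes it is invertible.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_diff = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    stat = float(d @ np.linalg.solve(v_diff, d))
    return stat, d.size


# Hypothetical estimates for two slope coefficients.
stat, df = hausman(
    b_fe=[1.0, 2.0],
    b_re=[0.8, 2.1],
    v_fe=np.diag([0.05, 0.04]),
    v_re=np.diag([0.01, 0.02]),
)
# H = 0.2**2 / 0.04 + 0.1**2 / 0.02 = 1.5 with 2 degrees of freedom, well below the
# 5 percent chi-square critical value of 5.99, so the RE-model would not be rejected.
print(stat, df)
```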

The lower part of Table 2 provides the relevant statistics for the regressions on the expanded variable set,
incorporating CC and TC in addition to SC. R² levels rise for all estimated models. Since it is impossible to estimate
the FE-model for the expanded variable set, one cannot conduct a Hausman test for the RE-model. The OLS
model is again rejected relative to the RE-model at any common level of statistical significance. Based on the results of
Table 2, only the RE-models can be considered viable.

Table 3 reports for each class the percentage difference between the actual TUCE and SET scores and both the
sample average values and the predicted values from the two RE-models. The 49 classes are ranked according to the
percentage by which the actual TUCE score for a class/instructor differs from the sample average TUCE score for the
49 classes/instructors, with the top-ranking class/instructor listed first. Table 3 reveals that above-average performance
according to the TUCE is not necessarily associated with above-average performance according to the SET; TUCE and
SET do not appear to be closely correlated. This applies to the ranking based on average scores as well as to those that
use predicted values from the RE-models. The low correlation of rankings based on TUCE and SET scores is
underscored by the Spearman rank correlation coefficients reported in Table 4. For all three methods (average and both
RE-models) the rank correlation is below 0.07 or even slightly negative. None of the three correlation coefficients is
statistically different from zero according to the z-scores provided in parentheses.
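The rank correlations and z-scores of Table 4 can be computed as follows. The reported z-scores are consistent with z = ρ·√(n − 1) for n = 49 classes (for example, 0.850 × √48 ≈ 5.89). The sketch assumes there are no tied scores, in which case Spearman's ρ = 1 − 6Σd²/(n(n² − 1)); the function names are illustrative.

```python
import math


def ranks_best_first(scores):
    """Rank 1 = highest score. Assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation from squared rank differences (no ties)."""
    n = len(x)
    rx, ry = ranks_best_first(x), ranks_best_first(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def z_score(rho, n):
    """Large-sample test statistic for H0: rho = 0."""
    return rho * math.sqrt(n - 1)


# Average-method TUCE and SET percentages for classes 1-5 of Table 3
# (an illustrative subset, not the full 49-class comparison).
tuce = [59.0, 55.1, 51.2, 40.5, 38.7]
set_scores = [-13.8, 8.4, 9.7, 6.9, -4.0]
print(round(spearman_rho(tuce, set_scores), 3))
print(round(z_score(0.850, 49), 2))  # matches the 5.89 reported in Table 4
```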

Based on the rank correlation coefficients in Table 4, the RE (SC) model rankings are much closer to those based
on averages than the rankings from the RE-model that is estimated for all available variables. The correlation for SET
scores is particularly close (0.993) between the method based on averages and the one using the reduced variable set for
prediction purposes (RE-SC). This close association suggests that much of the explanation of SET scores comes from
the cross-section term v_i rather than from the variables in SC. This is confirmed by the low R² (0.026) for the
corresponding OLS regression reported in the upper part of Table 2. The OLS equation for SET explains somewhat more
for the complete set of variables (R² = 0.119) and, as a consequence, there is more of a difference in rankings between
this RE-model and the method based on averages (rank correlation = 0.775). The rankings based on equation predictions
(RE-models) tend to have lower correlations with those based on averages for the TUCE than for the SET. Again, this
can be explained by the higher R² for the TUCE equations than for the SET equations (Table 2). In all likelihood, the
higher the R² of the prediction equation, the lower the correlation of its rankings with those of the traditional method
based on averages.

Table 3 reveals that striking changes in class/instructor performance evaluation can result as one moves from
the traditional method based on averages to the suggested one based on predicted values. Assume first that the
TUCE score is used for evaluation purposes. Then classes 7, 8, and 13, for example, would receive high marks since
their actual TUCE scores are considerably above the average TUCE score. Using the prediction method based on the RE-model
with a complete set of variables suggests, by contrast, that the same classes performed below expectations. In
terms of instructor evaluation, the second method would indicate that the instructor has not done the job that one could
have expected of him/her given the favorable conditions of the class (good students, etc.). There are also numerous classes
that show poor TUCE results based on the traditional method of averages but better-than-expected scores based on the
prediction equation with the full set of variables. This applies, for example, to classes 22, 23, 35, and 47.

Similar discrepancies between performance scores based on averages and those based on prediction equations result
if one assumes that performance is measured by SET scores rather than TUCE scores. For example, classes 3, 15, 29,
34, and 48 show above-average performance based on the traditional method of averages but below-expectation scores
based on the prediction equation (RE - SC, CC, TC). Classes 13, 17, and 46 are examples of low performance
evaluations based on the traditional method but good evaluations based on the method suggested in this paper.

Table 5 ranks all 49 classes/instructors for each of the three methods examined in this paper. Three columns
are provided for each method. The first column ranks classes based on the TUCE score of Table 3, with the SET score
receiving zero weight. Hence, the very first column of Table 5 repeats the ranking in the first results column of Table 3.
The second column presents rankings if the TUCE and SET scores of Table 3 receive equal weight. The third column
ranks classes by the SET score of Table 3, with the TUCE score receiving zero weight. Table 5 illustrates the rank
correlation results presented in Table 4. It shows the large ranking differences between TUCE and SET regardless of
the method used. It also clearly reveals the very close correlation of rankings based on SET scores for the traditional
method based on averages and the prediction method based on the short list of variables (RE-SC). Table 5 shows that
the class ranked 47th on the basis of its TUCE score by the traditional method of averages ranks 7th under the RE (SC,
CC, TC) method. Similarly, the eighth-ranked class under the method of averages moves to rank 37 under the RE (SC, CC,
TC) method.
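The three columns per method in Table 5 amount to weighting the two percentage scores of Table 3. A minimal sketch, assuming the combined score is w × TUCE + (1 − w) × SET for a TUCE weight w; the function name is illustrative, and the data are the average-method percentages for classes 1 to 5 only.

```python
def rank_classes(tuce_pct, set_pct, tuce_weight):
    """Order class numbers from best to worst by a weighted score.

    tuce_pct and set_pct map class number -> percentage difference from the
    benchmark (sample average or predicted score), as in Table 3.
    """
    if not 0.0 <= tuce_weight <= 1.0:
        raise ValueError("tuce_weight must lie between 0 and 1")
    combined = {
        c: tuce_weight * tuce_pct[c] + (1.0 - tuce_weight) * set_pct[c]
        for c in tuce_pct
    }
    return sorted(combined, key=combined.get, reverse=True)


# Average-method percentages for classes 1-5 from Table 3 (illustrative subset).
tuce = {1: 59.0, 2: 55.1, 3: 51.2, 4: 40.5, 5: 38.7}
set_scores = {1: -13.8, 2: 8.4, 3: 9.7, 4: 6.9, 5: -4.0}
for w in (1.0, 0.5, 0.0):
    print(w, rank_classes(tuce, set_scores, w))
```

For these five classes, the equal-weight ordering 2, 3, 4, 1, 5 agrees with their relative order in the equal-weight column for the method of averages in Table 5.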

4. SUMMARY AND CONCLUSIONS 

The study has introduced a new method of ranking instructors' teaching effectiveness. It consists of comparing an
instructor's actual evaluation score with the one that is predicted for him/her rather than with the average score for all
other teachers. The prediction is based on a random-effects regression of the evaluation score on student, class, and
teacher characteristics that are not under the control of the instructor.

The methodology was applied to rank 49 classes of principles of economics taught at comprehensive
universities. Two basic evaluation scores were used: a knowledge-based outcomes test (TUCE) and student evaluations.
From these two, rankings were also derived for a third assessment score, one that puts equal weight on the knowledge-based
outcomes test score and the teaching evaluation score. Substantial differences were found in instructor rankings between
the three evaluation scores and also between rankings based on the suggested methodology and the traditional approach
of comparing an instructor's score to the corresponding average score for all instructors.

The methodology adopted in this paper can be adapted to any university, college, or department to make
teaching assessment more meaningful. Additional variables that may be thought of as important in determining an
outcomes test or student evaluation score but that cannot be controlled by the instructor can easily be included in the model.
Among such additional determinants are the time of day the class is given, a variable that identifies
whether a class is required for a student, or a variable that identifies the student as full-time or part-time. Which variables
are ultimately included in the prediction equation depends on the available data and the interests that prevail at a particular
institution.

It is apparent that there is no need to rely on the TUCE as an outcomes measure; any other outcomes test will
also do. The only requirement for the suggested procedure is that one has data on a sufficient number of different
classes/instructors before the prediction regression is run. This should not pose a problem in larger departments where
a dozen or more parallel sections are taught for the same course. Nothing in the methodology limits its applicability to
economics classes either. Nor is it confined to principles classes, although a large number of
concurrent sections is more likely to be taught for principles than for other classes.

If one limits the analysis to student evaluations, the suggested methodology could be applied across the board
to all undergraduate and graduate classes regardless of field. Rankings could be generated from a single regression
equation for the whole institution. The only potential modification would be the inclusion of dummy variables to account
for differences between graduate and undergraduate classes, by field of study, or by college within the university. Compared to
today's practice of dealing with student evaluations, this would make them significantly more comparable and, hence,
more useful across classes and fields. With this methodology it would be possible to rank all instructors of the
university. Top teachers could be identified and rewarded every semester with great ease. There would also be little need
to collect student answers on dozens of questions. One question on overall teacher performance would suffice as input
for the above methodology. However, if one has student responses to other questions on the student evaluation form, it
is of course possible to construct a prediction equation and to rank instructors for each question on the student
questionnaire for which ranking makes sense.
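Such dummy variables would follow the same convention as the race variables in Table 1, where one category (RACE1 = white) serves as the base and receives no dummy. A small sketch of that encoding; the function name and the category labels are hypothetical.

```python
import numpy as np


def dummy_columns(labels, base):
    """Build 0/1 indicator columns for every category except the base category.

    Omitting the base avoids perfect collinearity with the intercept; each
    coefficient is then measured relative to the base category.
    """
    labels = list(labels)
    categories = sorted(set(labels) - {base})
    matrix = np.array(
        [[1.0 if label == cat else 0.0 for cat in categories] for label in labels]
    ).reshape(len(labels), len(categories))
    return categories, matrix


# Hypothetical classes described by level and field of study.
level = ["undergraduate", "graduate", "undergraduate", "graduate"]
field = ["economics", "history", "physics", "economics"]

level_names, level_dummies = dummy_columns(level, base="undergraduate")
field_names, field_dummies = dummy_columns(field, base="economics")
print(level_names, level_dummies.ravel())
print(field_names)
print(field_dummies)
```

The resulting columns would simply be appended to the SC, CC, and TC regressors of the institution-wide prediction equation.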

Unfortunately, the results of this paper have confirmed earlier evidence by Abrami et al. (1990) and Gramlich
and Greenlee (1993) that the correlation between instructor rankings based on a knowledge-based outcomes measure, such
as the TUCE, and instructor rankings on the basis of student evaluations is very low. Good teaching evaluations do not
necessarily mean that students learn a lot. If this result is taken seriously by university administrators, student evaluations
cannot be used in isolation to assess teaching effectiveness. They need to be supplemented with knowledge-based
outcomes tests. This paper has offered a way to combine both in one measure of teaching effectiveness.

Looking toward the practical policy conclusions of this study, it seems that one could divide the path toward
more useful teaching assessments into two steps. In a first step, the suggested methodology of ranking instructors by
comparing actual to predicted scores could be implemented for teaching evaluations alone. This requires very few
additional resources and can, therefore, be done almost immediately. In a second step, one would develop knowledge-based
outcomes tests for each field and course. This would take significantly more time and resources and, for practical
purposes, this step may be limited to principles classes or other large undergraduate or beginning graduate classes. Once
the two steps are complete and all instructors teach at least one of the courses evaluated with an outcomes test, an
instructor's overall teaching assessment would be calculated as a weighted average of the knowledge-based outcomes
score for one of his/her classes and the SETs from all his/her classes taught during a given term.
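The proposed overall assessment can be written down directly. The paper does not fix the weight or say how the SETs from several classes should be pooled; the sketch below assumes a simple unweighted mean of the SET scores and a hypothetical default weight of 0.5, and it assumes both inputs are on a comparable scale, such as the percentage deviations from predicted scores used in Table 3.

```python
def overall_assessment(outcome_score, set_scores, outcome_weight=0.5):
    """Weighted average of one outcomes-test score and the mean of a term's SET scores.

    outcome_weight is a policy choice (hypothetical default of 0.5); all scores
    are assumed to be on the same scale, e.g. percentage deviations of actual
    from predicted scores.
    """
    if not set_scores:
        raise ValueError("at least one SET score is required")
    if not 0.0 <= outcome_weight <= 1.0:
        raise ValueError("outcome_weight must lie between 0 and 1")
    mean_set = sum(set_scores) / len(set_scores)
    return outcome_weight * outcome_score + (1.0 - outcome_weight) * mean_set


# Hypothetical instructor: outcomes score 10 percent above prediction in one class,
# SET scores of +4, -2 and +7 percent in the three classes taught that term.
print(overall_assessment(10.0, [4.0, -2.0, 7.0]))
```

Pooling SETs by enrollment rather than by a simple mean would be an equally defensible design choice.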
Table 1. Definition of Variables

Variable Group              Variable      Definition
Dependent Variables         TUCE          number of total correct answers on final test
                            SET           student evaluation score
Student                     GEN           1 if gender is male
Characteristics (SC)        GPA           cumulative grade point average
                            PRE           score on TUCE pre-test
                            SATACT        SAT or ACT score, converted to a common scale
                            RACE2-RACE4   race; RACE1 = white is base
                            GPASAT        GPA * SATACT
Class                       CLAS1         1 for class size 31 to 40
Characteristics (CC)        CLAS2         1 for class size 41 to 50
                            CLAS3         1 for class size 51 to 75
                            CLAS4         1 for class size 76 to 100
                            CLAS5         1 for class size 101 to 200
                            D2WK          1 for class with two weekly meetings
                            D3WK          1 for class with three weekly meetings
                            D4WK          1 for class with four weekly meetings
Teacher                     ENG           1 if instructor's native tongue is English
Characteristics (TC)        PHD           1 if instructor holds doctorate
                            YRSTCH        years instructor has been teaching
                            YRSTCHCS      years instructor has been teaching course
                            CLASTRM       number of classes taught by instructor per term
                            CORSTRM       number of courses taught by instructor
                            CLAST2        number of classes taught squared
                            CORS2         number of courses taught squared

Notes: Class size is defined as the sum of the number of observations in each course.
Table 2. Least Squares, Fixed-Effects, and Random-Effects Models Compared

                                 SET                        TUCE
                           OLS      FE      RE         OLS      FE      RE
SC variables only
  R²                     0.026   0.287   0.194       0.447   0.591   0.502
  adj. R²                0.010   0.232   0.131       0.441   0.559   0.464
  Tests (p-values)
    OLS vs. FE           0.000                       0.000
    RE vs. FE            0.482                       0.240
    OLS vs. RE           0.000                       0.000

SC, CC, TC variables
  R²                     0.119      na   0.226       0.527      na   0.537
  adj. R²                0.092      na   0.148       0.513      na   0.491
  Tests (p-values)
    OLS vs. RE           0.000                       0.000

Notes: OLS stands for ordinary least squares without group effects; FE indicates a fixed-effects
and RE a random-effects model. A low p-value, such as 0.000, means that the null hypothesis is rejected.
Table 3. Actual Score as a Percentage of Average and Predicted Scores

       Percentage of      Percentage of Predicted Score
       Average Score       RE (SC)        RE (SC, CC, TC)
Class     TUCE     SET      TUCE     SET      TUCE     SET
    1     59.0   -13.8      32.6   -10.3      22.0   -10.2
    2     55.1     8.4      25.1    12.3      20.9    21.3
    3     51.2     9.7      13.5    10.6       6.8    -2.4
    4     40.5     6.9      13.6     8.6      12.2     1.2
    5     38.7    -4.0      29.0    -3.7      11.4     1.2
    6     28.9   -14.5      23.2   -14.1       1.5    -3.8
    7     28.3     1.8      14.5     2.6      -5.2    -0.5
    8     27.9     8.4      19.3     9.2      -6.4     2.6
    9     25.6   -10.8       5.6    -9.3      19.1    -6.1
   10     25.4    13.1      16.6    13.5      14.4    19.7
   11     22.1     3.2       9.6     3.4       5.5    -0.9
   12     21.9   -19.0       1.7   -18.7       8.0   -10.1
   13     20.6    -9.5      10.0    -8.4      -0.1     3.9
   14     15.7    13.4       0.1    13.7       3.6    10.0
   15     15.2     7.1      18.8     6.8      10.3    -1.8
   16     12.2    12.8      13.8    13.0      13.7     7.9
   17      3.9    -6.9      14.1    -4.8      19.9     3.7
   18      3.1     5.1       5.5    10.1      -3.1     2.4
   19      3.0    -9.9       0.6    -8.0       1.5    -3.3
   20      1.6    -0.1      -2.6     1.2      -4.4     2.4
   21      1.4    -0.0     -17.0     0.3      -5.6     5.4
   22      0.0    -4.9       5.6    -4.0      12.2   -12.0
   23     -1.9     6.8      -3.2     6.5      10.9     8.5
   24     -2.8    -4.0      -3.6     1.9       2.2     1.5
   25     -3.1   -35.1      -8.1   -33.0      -8.4   -25.4
   26     -4.0    10.9      -4.8    11.3      -1.6     6.4
   27     -4.4   -12.1      -0.3   -10.1      -4.6    -8.5
   28     -4.4    13.1      -4.4    13.1      -6.8    14.9
   29     -5.0     9.7      -2.8     9.9     -13.0    -1.5
   30     -6.0    -3.5      -1.5    -3.6      -0.4    -6.8
   31     -6.4   -24.6      -0.2   -23.4      -0.5   -23.8
   32     -9.0     5.0     -14.3     5.4      -8.0     3.7
   33     -9.4   -22.1      -4.4   -22.2      -3.4   -23.6
   34     -9.5     5.3      -8.8     6.2      -6.0    -4.1
   35     -9.5   -10.3     -10.1    -9.3       5.9    -3.9
   36    -11.2   -11.0      -5.7   -10.9      -7.5    -9.8
   37    -12.8     4.2     -13.2     4.3      -4.5     4.7
   38    -13.9    10.6     -14.2    11.9      -8.5     2.8
   39    -14.7    14.6     -13.0    17.7     -14.8    15.8
   40    -15.1     3.5      -8.5     4.0      -0.7     2.4
   41    -16.9    -4.6     -14.5    -3.8      -1.5    -2.0
   42    -17.3    -1.4      -8.1    -1.6      -2.0    -3.4
   43    -18.0   -12.7     -14.7   -10.7      -7.5    -5.8
   44    -20.0   -21.6     -17.4   -21.4     -16.8    -9.3
   45    -20.2    -9.5     -15.6    -9.0      -0.4    -4.3
   46    -21.7    -6.7     -19.4    -5.7     -12.2     6.7
   47    -21.8     0.4      11.5     2.3      13.5    -0.8
   48    -25.7     6.8     -14.7     6.9      -7.4    -2.1
   49    -27.2    17.4     -21.7    18.5     -20.3    10.9

Note: RE stands for random-effects model. (SC) indicates that only the variables
in set SC of Table 1 were used for the prediction equation. (SC, CC, TC) means
that all variables of Table 1 were used for prediction purposes. Classes are
ranked according to performance as measured by the percentage difference between
actual and sample average TUCE score (first results column).
Table 4. Spearman Rank Correlation Coefficients

                        Average             RE (SC)          RE (SC, CC, TC)
                        TUCE       SET      TUCE       SET      TUCE       SET
Average
  TUCE                           0.056     0.850               0.616
                                (0.38)    (5.89)              (4.27)
  SET                                                0.993               0.775
                                                    (6.88)              (5.37)
RE (SC)
  TUCE                                               0.067     0.722
                                                    (0.46)    (5.00)
  SET                                                                    0.782
                                                                        (5.42)
RE (SC, CC, TC)
  TUCE                                                                  -0.016
                                                                       (-0.11)

Notes: Numbers in parentheses are z-scores. A value above 1.64 in
absolute terms identifies statistical significance at better than the
five percent level.
Table 5. Class/Instructor Rankings for Alternative Methods:
Average versus Prediction Equations and Alternative TUCE Weights

                Average           RE (SC)       RE (SC, CC, TC)
TUCE weight:    1  0.5    0       1  0.5    0       1  0.5    0
                1    2   49       1    2   49       1    2    2
                2    3   39       5   10   39       2   10   10
                3    4   14       2    8   14      17   17   39
                4    1   28       6   16   10       9   16   28
                5   10   10       8   15   28      10   23   49
                6    8   16      15    5   16      16   14   14
                7    5   26      10    3    2      47    4   23
                8    7   38       7    1   38       4    9   16
                9   14   29      17    4   26      22   47   46
               10   11    3      16    7    3       5    5   26
               11   16    8       4   18   18      23    1   21
               12   15    2       3   14   29      15   15   37
               13    9   15      47   47    8      12   28   13
               14    6    4      13   11    4       3   26   32
               15   13   23      11   17   48      35   11   17
               16   28   48      22    6   15      11    3   38
               17   18   34       9   28   23      14   13    8
               18   26   18      18   29   34      24   24   20
               19   23   32      12   26   32      19   35   40
               20   29   37      19   39   37       6   40   18
               21   12   40      14   23   40      13   39   24
               22   20   11      31   22   11      45   37    5
               23   21    7      27   13    7      30   22    4
               24   39   47      30   20   47      31   21    7
               25   17   21      20   24   24      40   18   47
               26   38   20      29   38   20      41   19   11
               27   32   42      23   34   21      26   20   29
               28   34   30      24   49   42      42   12   15
               29   22   24      33    9   30      18    6   41
               30   24    5      28   40    5      33   41   48
               31   19   41      26   30   41      20    8    3
               32   37   22      36   19   22      37   32   19
               33   30   46      42   48   17      27   45   42
               34   49   17      25   37   46       7   42    6
               35   40   13      40   32   19      21   46   35
               36   27   45      34   42   13      34   38   34
               37   42   19      35   27   45       8    7   45
               38   48   35      39   36   35      28   30   43
               39   35    9      37   21    9      48   49    9
               40   47   36      38   12   27      43   48   30
               41   41   27      32   41    1      36   34   27
               42   36   43      41   35   43      32   27   44
               43   46    1      48   31   36      25   43   36
               44   45    6      43   45    6      38   29   12
               45   43   12      45   46   12      46   36    1
               46   31   44      21   43   44      29   31   22
               47   33   33      44   33   33      39   44   33
               48   25   31      46   44   31      44   33   31
               49   44   25      49   25   25      49   25   25

Note: Classes are ranked from best to worst. The numbers refer to the class
numbers identified in Table 3, which are identical to the ones in the first
column of this table.